Week 1

Activities

  • Completed DataCamp assessments of R programming.

  • Finished GitHub tutorial on version control and collaboration.

  • Read Analyzing US Census Data by Kyle Walker.

  • Did exploratory analysis and visualization on an M&M candy data set using the newly gained skills.

  • Learned census analysis with tidycensus in the workshops, then did an exercise to find the county in Iowa with the lowest median household income, based on the 2020 American Community Survey 5-year estimates.
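library(tidycensus)  # load_variables(), get_acs()
library(dplyr)       # %>%, arrange()

# Browse the variable codes available in the 2020 5-year ACS
var <- load_variables(2020, "acs5", cache = TRUE)
# View(var)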

ia <- get_acs(
  geography = "county",
  variables = c(medincome = "B19013_001"),
  state = "IA",
  year = 2020
)

iaRearranged <- ia %>%
  arrange(estimate) %>%
  head(5)

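# Note: max() over the five lowest rows returns the fifth-lowest county;
# min() (or the first row after arrange()) gives the single lowest.
iaRearranged$NAME[iaRearranged$estimate == max(iaRearranged$estimate)]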
## [1] "Decatur County, Iowa"
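
The exercise above comes down to picking the row with the smallest estimate. A minimal base-R sketch of that step follows. It uses a small made-up data frame shaped like get_acs() output; the county names and values are hypothetical placeholders, not ACS results.

```r
# Hypothetical stand-in for get_acs() output; names and values are invented.
toy <- data.frame(
  NAME = c("County A", "County B", "County C"),
  estimate = c(52000, 41000, 47000),
  stringsAsFactors = FALSE
)

# which.min() returns the index of the first minimum, so this selects
# the single lowest-income row directly, with no intermediate head().
lowest <- toy$NAME[which.min(toy$estimate)]
print(lowest)
```

Selecting with which.min() avoids the sort-then-filter step, and it always returns exactly one row even when two counties tie.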

Skills Acquisition

Bookmarks

Week 2

Activities

  • Discussed the scope of the projects with the teams and did some groundwork to get familiar with the projects.
    • The DHR Disability project seeks to understand the full range and complexity of Iowa’s disability community. I have improved my understanding of underserved and underrepresented populations (deaf and hard of hearing persons, persons with disabilities, women) in Iowa by reviewing literature related to the research topic.
    • Community Visioning is a survey-based transportation enhancement plan. On the project’s opening day, I explored the median age of Iowa communities participating in the 2006 Visioning Survey to see the distribution of age across cities.
  • Worked with the A-Day team to compare Illinois and South Dakota on income (median household, median family, and per capita), age and race demographics, primary occupations by number, and top agricultural products.
ilIncome <- get_acs(geography = "state",
              variables = c(medHousehold = "B19013_001", medFamily = "B19113_001", Capita = "B19301_001"),
              state = "IL")
## Getting data from the 2016-2020 5-year ACS
sdIncome <- get_acs(geography = "state",
              variables = c(medHousehold = "B19013_001", medFamily = "B19113_001", Capita = "B19301_001"),
              state = "SD")
## Getting data from the 2016-2020 5-year ACS
Income <- full_join(ilIncome, sdIncome)
## Joining, by = c("GEOID", "NAME", "variable", "estimate", "moe")
# Barplot for incomes in Illinois and South Dakota
ggplot(Income, aes(x = NAME, weight = estimate)) +
  geom_bar() +
  facet_grid(~ variable) +
  labs(title = "Median Income in the Past 12 Months in 2020 Inflation-adjusted Dollars",
       subtitle = "2016-2020 American Community Survey",
       x = "State",
       y = "ACS Estimated Income") +
  scale_y_continuous(labels=scales::dollar_format(), limits = c(0, 90000)) +
  geom_errorbar(aes(ymin = estimate - moe,
                    ymax = estimate + moe,
                    color = "Margin of error"),
                width = 0.5) +
  geom_text(aes(y = estimate,
                label = sprintf("$%0.0f", estimate)),
            vjust = -1)

ilOcc <- get_acs(geography = "state",
              variables = c(Management_business_science_arts = "C24050_015",
                            Service = "C24050_029",
                            Sales_office = "C24050_043",
                            NaturalResources_construction_maintenance = "C24050_057",
                            ProductionTransportation_materialMoving = "C24050_071"),
              state = "IL")
## Getting data from the 2016-2020 5-year ACS
ilOcc <- ilOcc %>% rename(occupation = variable) %>%
  mutate(percent = round(estimate / sum(estimate) * 100, 2))

sdOcc <- get_acs(geography = "state",
              variables = c(Management_business_science_arts = "C24050_015",
                            Service = "C24050_029",
                            Sales_office = "C24050_043",
                            NaturalResources_construction_maintenance = "C24050_057",
                            ProductionTransportation_materialMoving = "C24050_071"),
              state = "SD")
## Getting data from the 2016-2020 5-year ACS
sdOcc <- sdOcc %>% rename(occupation = variable) %>%
  mutate(percent = round(estimate / sum(estimate) * 100, 2))

Occupation <- full_join(ilOcc, sdOcc)
## Joining, by = c("GEOID", "NAME", "occupation", "estimate", "moe", "percent")
# Barplot for occupations in Illinois and South Dakota
ggplot(Occupation, aes(x = NAME, weight = estimate)) +
  geom_bar() +
  facet_grid(~ occupation) +
  labs(title = "Industry by Occupation for the Civilian Employed Population 16 Years and Over",
       subtitle = "2016-2020 American Community Survey",
       x = "State",
       y = "ACS Estimated Population") +
  geom_errorbar(aes(ymin = estimate - moe,
                    ymax = estimate + moe,
                    color = "Margin of error"),
                width = 0.5) +
  geom_text(aes(y = estimate,
                label = estimate),
            vjust = -1)

# Pie chart for occupations in Illinois
ggplot(ilOcc, aes(x = "", y = estimate, fill = occupation)) +
  geom_bar(stat = "identity") +
  coord_polar("y", start = 0) +
  labs(title = "Industry by Occupation in Illinois",
       subtitle = "2016-2020 American Community Survey") +
  geom_text(aes(label = paste0(percent, "%")),
            position = position_stack(vjust = 0.5)) +
  theme_void()

# Pie chart for occupations in South Dakota
ggplot(sdOcc, aes(x = "", y = estimate, fill = occupation)) +
  geom_bar(stat = "identity") +
  coord_polar("y", start = 0) +
  labs(title = "Industry by Occupation in South Dakota",
       subtitle = "2016-2020 American Community Survey") +
  geom_text(aes(label = paste0(percent, "%")),
            position = position_stack(vjust = 0.5)) +
  theme_void()

Skills Acquisition

  • Point and polygon data (Visualizing Geospatial Data in R)
    • Two ways to access slots in an S4 object:
      x@slot_name
      slot(x, "slot_name")
    • Use double bracket subsetting (i.e. [[...]]) to extract an element in a slot.
    • $ and [[ subsetting on a Spatial___DataFrame pulls columns directly from the data frame. That is, if x is a Spatial___DataFrame object, then either x$col_name or x[["col_name"]] pulls out the col_name column from the data frame.
    • Create a logical from a column, let’s say countries in Asia: in_asia <- countries_spdf$region == "Asia"
      Then, use the logical to select rows of the Spatial___DataFrame object: countries_spdf[in_asia, ]
    • ggplot2 expects data in data frames, tmap expects data in spatial objects.
    • To change the projection of a ggplot2 plot, use the coord_map() function.
      In tmap, tm_shape() takes a projection argument that lets you swap projections for the plot.
    • tmap_save() saves a tmap plot to a file. The file name's extension sets the file type: for example, .png or .pdf for static plots, while .html saves an interactive version that uses the leaflet package.
  • get_estimates() gets data from the US Census Bureau Population Estimates APIs.
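
The slot-access and logical-subsetting notes above can be tried without any spatial packages. The sketch below uses a hypothetical toy S4 class, ToySpatial, as a stand-in for a Spatial___DataFrame object; the class, its slots, and the data are invented for illustration. Unlike the real sp classes, the toy class does not forward $ to its data frame, so the column is reached through the data slot.

```r
library(methods)  # S4 support; attached by default in interactive R

# Hypothetical toy class standing in for a Spatial___DataFrame object.
setClass("ToySpatial", slots = c(data = "data.frame", bbox = "numeric"))

countries <- new(
  "ToySpatial",
  data = data.frame(
    name = c("Japan", "Kenya", "Nepal"),
    region = c("Asia", "Africa", "Asia"),
    stringsAsFactors = FALSE
  ),
  bbox = c(xmin = -180, xmax = 180)
)

# Two equivalent ways to read a slot
d1 <- countries@data
d2 <- slot(countries, "data")

# Double brackets extract a single element from the slot's contents
xmin <- countries@bbox[["xmin"]]

# Build a logical vector from a column, then use it to select rows
in_asia <- countries@data$region == "Asia"
asia <- countries@data[in_asia, ]
print(asia$name)
```

With a real Spatial___DataFrame, the same logical vector can index the object itself, as in countries_spdf[in_asia, ], which keeps the geometry and the attribute rows in sync.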

Week 3

Activities

Skills Acquisition

Bookmarks